
Land analytics-engine PPL integration into main #5431

Closed

ahkcs wants to merge 22 commits into main from feature/mustang-ppl-integration

Conversation

ahkcs (Collaborator) commented May 11, 2026

Summary

Merges the long-running feature/mustang-ppl-integration branch into main, delivering the analytics-engine PPL integration end-to-end. This is the consolidation PR for 22 feature commits accumulated since the branch's last sync.

Note

This PR currently shows merge conflicts against main. The catch-up PR #5430 brings feature/mustang-ppl-integration up to current main and resolves all seven conflicts; once #5430 merges, this PR should be conflict-free and ready for review/merge.

What this delivers

Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces:

Plus supporting infrastructure:

Full commit list (22)

Compatibility / opt-in

The analytics-engine path is gated on the analytics-engine plugin being installed via extendedPlugins (#5403 makes that dependency optional). On clusters without analytics-engine, all PPL/SQL queries continue through the existing v2 / Calcite paths — no behavior change. On clusters with analytics-engine installed, only Parquet-backed indices route through the new path (#5429 — selected by index setting).
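The opt-in rule above reduces to a two-condition check. A minimal, dependency-free sketch (the class and method names here are illustrative; the real check lives in TransportPPLQueryAction and is driven by an index setting):

```java
// Hypothetical sketch of the routing gate described above: queries go to the
// analytics engine only when the plugin is installed AND the target index is
// Parquet-backed; everything else stays on the existing v2/Calcite paths.
public class AnalyticsRouting {
    public static String route(boolean analyticsEngineInstalled, boolean parquetBackedIndex) {
        if (analyticsEngineInstalled && parquetBackedIndex) {
            return "analytics-engine";
        }
        return "v2/calcite"; // no behavior change for existing clusters
    }

    public static void main(String[] args) {
        System.out.println(route(true, true));   // analytics-engine
        System.out.println(route(true, false));  // v2/calcite
        System.out.println(route(false, true));  // v2/calcite
    }
}
```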

Test plan

  • All 22 feature-branch PRs landed individually with their own test coverage
  • :api / :core / :legacy / :opensearch-sql-plugin / :integ-test:compileTestJava pass locally after merging main
  • CI build passes (will turn green once the catch-up PR #5430 lands and this PR is rebased onto the updated base)
  • Integration tests on the analytics-engine path still pass
  • Compatibility run with tests.analytics.parquet_indices=true shows no regressions

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ahkcs and others added 22 commits March 26, 2026 16:46
…5267)

* [Mustang] Add query routing and execution handoff for Parquet-backed indices (#5247)

Implements the query routing and AnalyticsExecutionEngine for Project
Mustang's unified query pipeline. PPL queries targeting parquet_ prefixed
indices are routed through UnifiedQueryPlanner and executed via a stub
QueryPlanExecutor, with results formatted through the existing JDBC
response pipeline.

New files:
- QueryPlanExecutor: @FunctionalInterface contract for analytics engine
- AnalyticsExecutionEngine: converts Iterable<Object[]> to QueryResponse
  with type mapping and query size limit enforcement
- RestUnifiedQueryAction: orchestrates schema building, planning,
  execution on sql-worker thread pool, with client/server error
  classification and metrics
- StubQueryPlanExecutor: canned data for parquet_logs and parquet_metrics
  tables for development and testing

Modified files:
- RestPPLQueryAction: routing branch for parquet_ indices
- SQLPlugin: passes ClusterService and NodeClient to RestPPLQueryAction
- plugin/build.gradle: adds :api dependency for UnifiedQueryPlanner

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Align isClientError with RestPPLQueryAction classification

Add missing exception types to isClientError(): IndexNotFoundException,
ExpressionEvaluationException, QueryEngineException,
DataSourceClientException, IllegalAccessException. Matches the full
list in RestPPLQueryAction.isClientError().

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move stub code into analytics/stub sub-package

Extract StubSchemaProvider, StubQueryPlanExecutor, and StubIndexDetector
into plugin/.../rest/analytics/stub/ package to clearly separate
temporary stub code from production code. RestUnifiedQueryAction now
delegates to these stub classes.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use ErrorMessageFactory for error responses

Replace hand-crafted JSON error response with
ErrorMessageFactory.createErrorMessage(), matching the standard error
format used in RestPPLQueryAction.reportError().

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Make metrics and response formatting query-type-aware

Use QueryType to select the correct metrics (PPL_REQ_TOTAL vs REQ_TOTAL,
PPL_FAILED_REQ_COUNT_* vs FAILED_REQ_COUNT_*) and LangSpec (PPL_SPEC
vs SQL_SPEC) so this class can serve both PPL and SQL queries when
unified SQL support is added.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move analytics routing from REST layer to transport layer

Move the analytics index routing check from RestPPLQueryAction into
TransportPPLQueryAction.doExecute(). This ensures the analytics path
gets the same PPL enabled check, metrics, request ID, and inter-plugin
transport support as the existing Lucene path. RestPPLQueryAction and
SQLPlugin are reverted to their original state.

Added executeViaTransport() to RestUnifiedQueryAction which returns
results via ActionListener<TransportPPLQueryResponse> instead of
RestChannel, integrating properly with the transport action pattern.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add query size limit to RelNode plan instead of post-processing

Add LogicalSystemLimit to the RelNode plan before passing it to the
analytics engine, consistent with PPL V3 (QueryService.convertToCalcitePlan).
This ensures the analytics engine enforces the limit during execution
rather than returning all rows for post-processing truncation.

Remove post-processing querySizeLimit truncation from
AnalyticsExecutionEngine -- the limit is now part of the plan the
executor receives.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Propagate security context to sql-worker thread pool

Wrap scheduled lambdas in both execute() and executeViaTransport() with
withCurrentContext() to capture and restore ThreadContext (user identity,
permissions, audit trail) on the worker thread. Follows the same pattern
as OpenSearchQueryManager.withCurrentContext().

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
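The capture-then-restore pattern this commit describes can be sketched without OpenSearch dependencies — a plain ThreadLocal stands in for ThreadContext, and the names here are illustrative, not the plugin's actual API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch of the withCurrentContext() pattern: capture the caller's
// context at scheduling time, restore it on the worker thread, and put the
// worker's previous context back when the task finishes.
public class ContextPropagation {
    static final ThreadLocal<String> USER = new ThreadLocal<>();

    static Runnable withCurrentContext(Runnable task) {
        final String captured = USER.get(); // captured on the calling thread
        return () -> {
            String previous = USER.get();
            USER.set(captured);
            try {
                task.run();
            } finally {
                USER.set(previous); // restore even if the task throws
            }
        };
    }

    /** Helper: set the caller context, run a task on a worker, report what it saw. */
    static String observedOnWorker(String user) throws Exception {
        USER.set(user);
        CompletableFuture<String> seen = new CompletableFuture<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(withCurrentContext(() -> seen.complete(USER.get())));
        pool.shutdown();
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker sees user=" + observedOnWorker("alice"));
    }
}
```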

* Move analytics routing after profiling setup

Move the analytics index routing check after QueryContext.setProfile()
and wrapWithProfilingClear(listener). Use clearingListener instead of
raw listener so profiling thread-local state is properly cleaned up
after analytics path execution.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove NPE from isClientError classification

NPE in the analytics path is a server bug (null schema field, missing
row), not bad user input. Removed from client error list. Will sync
this classification with RestPPLQueryAction updates in #5266.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove duplicate REST execution path from RestUnifiedQueryAction

Remove execute(query, queryType, channel), doExecute(), createQueryListener(channel),
recordSuccessMetric(), recordFailureMetric(), reportError(), and related
REST imports. Since routing now goes through TransportPPLQueryAction,
the REST-specific path was unused. Renamed executeViaTransport() to
execute() as the sole entry point.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [Mustang] Add explain support and integration tests for analytics path (#5247)

Add explain support for the analytics engine path:

- AnalyticsExecutionEngine.explain(RelNode, ExplainMode, ...): returns
  logical plan via RelOptUtil.toString() with mode-aware SqlExplainLevel
  (SIMPLE/STANDARD/COST). Physical and extended plans are null since the
  analytics engine is not yet available.

- RestUnifiedQueryAction.explain(): new entry point for explain requests,
  delegates to AnalyticsExecutionEngine.explain() with ExplainMode.STANDARD.
  Response formatted via JsonResponseFormatter with normalizeLf().

- TransportPPLQueryAction: routes explain requests for analytics indices
  to unifiedQueryHandler.explain() instead of unifiedQueryHandler.execute().

Integration tests (AnalyticsPPLIT):
- testExplainResponseStructure: verifies JSON shape (calcite.logical,
  calcite.physical=null, calcite.extended=null)
- testExplainProjectAndScan: LogicalProject + LogicalTableScan + table name
- testExplainFilterPlan: LogicalFilter with condition value
- testExplainAggregationPlan: LogicalAggregate with COUNT()
- testExplainSortPlan: LogicalSort

Unit tests (AnalyticsExecutionEngineTest):
- explainRelNode_returnsLogicalPlan
- explainRelNode_errorPropagation

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add AnalyticsExplainIT with expected output files

Add dedicated explain integration test class following the CalciteExplainIT
pattern: each test compares actual explain JSON against expected output
files in resources/expectedOutput/analytics/. Tests cover simple scan,
project, filter+project, aggregation, sort (with collation propagation
to LogicalSystemLimit), and eval.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix explain unit test to verify actual logical plan content

Use a real Calcite LogicalValues node instead of a mock so
RelOptUtil.toString() produces a deterministic plan. Assert the exact
expected logical plan string instead of just checking non-null.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove explain unit tests in favor of AnalyticsExplainIT

Explain is thoroughly covered by AnalyticsExplainIT with expected output
file comparison (6 tests). Remove redundant explain unit tests and
unused captureExplainListener helper. Execute unit tests unchanged.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove explain tests from AnalyticsPPLIT in favor of AnalyticsExplainIT

Explain is covered by AnalyticsExplainIT with expected output file
comparison (6 tests). Remove redundant explain tests and
extractLogicalPlan helper from AnalyticsPPLIT.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Switch explain tests to YAML format with format-aware listener

- Add YAML format support to createExplainListener() by checking
  pplRequest.getFormat(), matching TransportPPLQueryAction pattern
- Switch AnalyticsExplainIT from explainQueryToString (JSON) to
  explainQueryYaml (YAML) with assertYamlEqualsIgnoreId
- Replace JSON expected files with YAML expected files
- YAML serializer omits null fields (physical/extended), so expected
  files only contain the logical plan

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use request explain mode instead of hardcoded STANDARD

Pass pplRequest.mode() to analyticsEngine.explain() so the user's
?mode= parameter (simple, standard, cost, extended) is respected.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix normalizeLf inconsistency with existing PPL explain path

Only apply normalizeLf() for YAML format, return response directly for
JSON format. Matches TransportPPLQueryAction.createExplainResponseListener()
which uses normalizeLf for YAML but returns response as-is for JSON.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add toExplainLevel() to ExplainMode enum

Encapsulate the ExplainMode to SqlExplainLevel mapping in the enum
itself instead of repeating ternary logic in each execution engine.
AnalyticsExecutionEngine now uses mode.toExplainLevel() directly.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
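The refactor above moves the mode-to-level mapping into the enum itself. A dependency-free sketch of that shape — the real enum maps to Calcite's SqlExplainLevel; which constant each mode maps to is an assumption here, mirrored as strings so the example runs on its own:

```java
// Sketch of the toExplainLevel() refactor: each ExplainMode carries its own
// explain level, so execution engines call mode.toExplainLevel() instead of
// repeating ternary logic. Level names are assumed, mirrored as strings.
public class ExplainModeSketch {
    enum ExplainMode {
        SIMPLE("NO_ATTRIBUTES"),
        STANDARD("EXPPLAN_ATTRIBUTES"),
        COST("ALL_ATTRIBUTES");

        private final String explainLevel;

        ExplainMode(String explainLevel) {
            this.explainLevel = explainLevel;
        }

        /** Encapsulates the mapping in the enum rather than in each engine. */
        String toExplainLevel() {
            return explainLevel;
        }
    }

    public static void main(String[] args) {
        for (ExplainMode mode : ExplainMode.values()) {
            System.out.println(mode + " -> " + mode.toExplainLevel());
        }
    }
}
```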

* Remove logging from AnalyticsExplainIT

Remove LOG.info calls and Logger field. The YAML comparison assertion
already provides clear output on failure.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove explain coverage comments

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Reuse TransportPPLQueryAction.createExplainResponseListener for explain formatting

Remove duplicate createExplainListener from RestUnifiedQueryAction.
explain() now returns ExplainResponse via ResponseListener, and
TransportPPLQueryAction wraps it with its existing
createExplainResponseListener for format-aware (JSON/YAML) formatting.
This avoids duplicating the format selection logic.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Init CLAUDE.md (#5259)

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add label to exempt specific PRs from stalled labeling (#5263)

* Implement `reverse` performance optimization (#4775)

Co-authored-by: Jialiang Liang <jiallian@amazon.com>

* Add songkant-aws as maintainer (#5244)

* Move some maintainers from active to Emeritus (#5260)

* Move inactive current maintainers to Emeritus

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Remove affiliation column for emeritus maintainers

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* formatted

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix formatting in MAINTAINERS.md

Signed-off-by: Simeon Widdis <sawiddis@gmail.com>

Signed-off-by: Simeon Widdis <sawiddis@gmail.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>

* Add query cancellation support via _tasks/_cancel API for PPL queries (#5254)

* Add query cancellation support via _tasks/_cancel API for PPL queries

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

* Refactor PPL query cancellation to cooperative model and other PR suggestions.

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

---------

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
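The "cooperative model" mentioned above means the query loop polls a cancellation flag at safe points rather than being interrupted mid-operation. A minimal sketch under that assumption (names are illustrative, not the plugin's actual API):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of cooperative cancellation: a _tasks/_cancel request flips a flag,
// and the row-processing loop checks it between rows and exits cleanly.
public class CooperativeCancellation {
    static final AtomicBoolean cancelled = new AtomicBoolean(false);

    /** Process up to maxRows, checking the flag between rows; returns rows done. */
    static int processRows(int maxRows) {
        int done = 0;
        for (int i = 0; i < maxRows; i++) {
            if (cancelled.get()) {
                break; // cooperative exit at a safe point
            }
            done++;
            if (done == 3) {
                cancelled.set(true); // simulate a cancel request arriving mid-query
            }
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println("rows processed before cancel: " + processRows(10));
    }
}
```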

* Add Calcite native SQL planning in UnifiedQueryPlanner (#5257)

* feat(api): Add Calcite native SQL planning path in UnifiedQueryPlanner

Add SQL support to the unified query API using Calcite's native parser
pipeline (SqlParser → SqlValidator → SqlToRelConverter → RelNode),
bypassing the ANTLR parser used by PPL.

Changes:
- UnifiedQueryPlanner: use PlanningStrategy to dispatch
  CalciteNativeStrategy vs CustomVisitorStrategy
- CalciteNativeStrategy: Calcite Planner with try-with-resources
  for ANSI SQL
- CustomVisitorStrategy: ANTLR-based path for PPL (and future SQL V2)
- UnifiedQueryContext: SqlParser.Config with Casing.UNCHANGED to
  preserve lowercase OpenSearch index names

Signed-off-by: Chen Dai <daichen@amazon.com>

* test(api): Add SQL planner tests and refactor test base for multi-language support

- Refactor UnifiedQueryTestBase with queryType() hook for subclass override
- Add UnifiedSqlQueryPlannerTest covering SELECT, WHERE, GROUP BY, JOIN,
  ORDER BY, subquery, case sensitivity, namespaces, and error handling
- Update UnifiedQueryContextTest to verify SQL context creation

Signed-off-by: Chen Dai <daichen@amazon.com>

* perf(benchmarks): Add SQL queries to UnifiedQueryBenchmark

Add language (PPL/SQL) and queryPattern param dimensions for
side-by-side comparison of equivalent queries across both languages.
Remove separate UnifiedSqlQueryBenchmark in favor of unified class.

Signed-off-by: Chen Dai <daichen@amazon.com>

* docs(api): Update README to reflect SQL support in UnifiedQueryPlanner

Signed-off-by: Chen Dai <daichen@amazon.com>

* fix(api): Normalize trailing whitespace in assertPlan comparison

RelOptUtil.toString() appends a trailing newline after the last plan
node, which doesn't match Java text block expectations. Also add
\r\n normalization for Windows CI compatibility, consistent with
the existing pattern in core module tests.

Signed-off-by: Chen Dai <daichen@amazon.com>

---------

Signed-off-by: Chen Dai <daichen@amazon.com>

* [Feature] Support graphLookup with literal value as its start (#5253)

* [Feature] Support graphLookup as top-level PPL command (#5243)

Add support for graphLookup as the first command in a PPL query with
literal start values, instead of requiring piped input from source=.

Syntax:
  graphLookup table start="value" edge=from-->to as output
  graphLookup table start=("v1", "v2") edge=from-->to as output

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Spotless check

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Ignore child pipe if using start value

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add graphLookup integration tests per PPL command checklist

- Add explain plan tests in CalciteExplainIT with YAML assertions
- Add v2-unsupported tests in NewAddedCommandsIT
- Add CalcitePPLGraphLookupIT to CalciteNoPushdownIT suite
- Skip graphLookup tests when pushdown is disabled (required by impl)
- Add expected plan YAML files for piped and top-level graphLookup

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Remove brace of start value list

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Apply docs website feedback to ppl functions (#5207)

* apply doc website feedback to ppl functions

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* take out comments

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix json_append example

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix json_append example

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix links

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

---------

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

* feat(api): Add profiling support to unified query API (#5268)

Add query profiling infrastructure that measures time spent in each
query phase (analyze, optimize, execute, format). Profiling is opt-in
via UnifiedQueryContext.builder().profiling(true) and uses thread-local
context to avoid passing profiling state through every method.

Key changes:
- QueryProfiling/ProfileContext for thread-local profiling lifecycle
- UnifiedQueryContext.measure() API for timing arbitrary phases
- Auto-profiling in UnifiedQueryPlanner (analyze) and compiler (optimize)
- UnifiedQueryTestBase shared test fixture for unified query tests
- Comprehensive profiling tests with non-flaky >= 0 timing assertions

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add UnifiedQueryParser with language-specific implementations (#5274)

Extract parsing logic from UnifiedQueryPlanner into a UnifiedQueryParser
interface with language-specific implementations: PPLQueryParser (returns
UnresolvedPlan) and CalciteSqlQueryParser (returns SqlNode).

UnifiedQueryContext owns the parser instance, created eagerly by the
builder which has direct access to query type and future SQL config.
Each implementation receives only its required dependencies:
PPLQueryParser takes Settings, CalciteSqlQueryParser takes
CalcitePlanContext. UnifiedQueryPlanner.CustomVisitorStrategy now obtains
the parser from the context via the interface type.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix flaky TPC-H Q1 test due to bugs in `MatcherUtils.closeTo()` (#5283)

* Fix the flaky tpch Q1

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Change to ULP-aware to handle floating-point precision differences

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: qianheng <qianheng@amazon.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Jialiang Liang <jiallian@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
Co-authored-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
Co-authored-by: Chen Dai <daichen@amazon.com>
Co-authored-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
* [Mustang] Enable profiling and migrate to UnifiedQueryParser (#5247)

Task 1: Enable profiling (#5268)
- Add .profiling(pplRequest.profile()) to UnifiedQueryContext.builder()
  in both doExecute and doExplain

Task 2: Migrate to UnifiedQueryParser for index extraction (#5274)
- Replace StubIndexDetector regex with PPLQueryParser AST-based
  extraction: parse query, walk AST to find Relation node, extract
  table name via getTableQualifiedName()
- Delete StubIndexDetector
- isAnalyticsIndex() is now an instance method (needs PPLQueryParser)
- Constructor takes Settings for PPLQueryParser

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Switch to SimpleJsonResponseFormatter for profiling support

Switch from JdbcResponseFormatter to SimpleJsonResponseFormatter so
profiling data is included in the response when profile=true. The
SimpleJsonResponseFormatter calls QueryProfiling.current().finish()
to populate the profile field.

Update test assertions to match SimpleJsonResponseFormatter type names
(PPL_SPEC: INTEGER -> "int", STRING -> "string") and remove status
field check (not included by SimpleJsonResponseFormatter).

Add integration test verifying profile field appears in response.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use context parser for index extraction instead of standalone PPLQueryParser

Create UnifiedQueryContext upfront in isAnalyticsIndex() and use
context.getParser() for index name extraction. This reuses the
context-owned parser which supports both PPL and SQL, making it ready
for unified SQL support without code changes.

Remove standalone PPLQueryParser field and Settings constructor param.
isAnalyticsIndex() now takes QueryType to create the right context.
extractIndexName() handles UnresolvedPlan (PPL) with a TODO for
SqlNode (SQL) when unified SQL is enabled.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use AST visitor for index name extraction

Replace manual tree walking (findRelation) with IndexNameExtractor
visitor extending AbstractNodeVisitor. The visitor's visitRelation()
is called automatically by the AST accept/visitChildren pattern,
which handles tree traversal.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
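The accept/visitChildren pattern this commit adopts can be sketched in miniature. The real IndexNameExtractor extends AbstractNodeVisitor over the PPL AST; here a tiny stand-in node tree makes the traversal runnable on its own:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of visitor-based index extraction: the visitor only overrides
// visitRelation(), and the nodes' accept() methods handle tree traversal.
public class IndexNameExtractorSketch {
    interface Visitor { void visitRelation(Relation r); }
    interface Node { void accept(Visitor v); }

    static class Relation implements Node { // e.g. source=parquet_logs
        final String tableName;
        Relation(String tableName) { this.tableName = tableName; }
        public void accept(Visitor v) { v.visitRelation(this); }
    }

    static class Pipe implements Node { // e.g. ... | where ... | stats ...
        final List<Node> children;
        Pipe(Node... children) { this.children = Arrays.asList(children); }
        public void accept(Visitor v) {
            for (Node child : children) child.accept(v); // traversal lives here
        }
    }

    /** Walk the tree, collect relation names, return the first one (or null). */
    static String extractIndexName(Node root) {
        List<String> names = new ArrayList<>();
        root.accept(r -> names.add(r.tableName));
        return names.isEmpty() ? null : names.get(0);
    }

    public static void main(String[] args) {
        Node query = new Pipe(new Relation("parquet_logs"), new Pipe());
        System.out.println(extractIndexName(query)); // parquet_logs
    }
}
```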

* Wrap execute and explain with context.measure() for profiling

Wrap analyticsEngine.execute() and analyticsEngine.explain() calls
with context.measure(MetricName.EXECUTE, ...) so execution time is
captured in the profiling metrics. Planning is auto-profiled by
UnifiedQueryPlanner.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix EXECUTE profiling metric by recording inside AnalyticsExecutionEngine

Move EXECUTE metric recording into AnalyticsExecutionEngine.execute(),
between the actual execution (planExecutor + row conversion) and the
listener.onResponse() call. This ensures the metric is written before
SimpleJsonResponseFormatter calls QueryProfiling.finish() to snapshot.

Previously context.measure() was used in RestUnifiedQueryAction, but
finish() was called inside the listener callback (synchronously) before
measure()'s finally block could write the metric, resulting in 0ms.

Add IT assertion that execute phase time_ms > 0 to catch this bug.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [Mustang] Wire analytics-engine as extendedPlugins dependency (#5247)

Step 1: Plugin wiring and dependency integration.

- Add analytics-engine as extendedPlugins in plugin/build.gradle
- Vendor analytics-framework JAR (interfaces) and analytics-engine
  ZIP (plugin) built from OpenSearch sandbox/3.6 branch
- Delete local QueryPlanExecutor interface, use upstream
  org.opensearch.analytics.exec.QueryPlanExecutor from JAR
- Replace StubSchemaProvider with OpenSearchSchemaBuilder which reads
  real index mappings from ClusterState
- Delete StubSchemaProvider (no longer needed)
- Exclude shared JARs (Calcite, Guava, commons-*, etc.) from SQL
  plugin bundle to avoid jar hell with analytics-engine classloader
- Load analytics-engine plugin in integTest and remoteCluster test
  clusters before opensearch-sql-plugin
- Create parquet_logs and parquet_metrics indices in ITs so
  OpenSearchSchemaBuilder can resolve the schema
- Update explain expected files for alphabetical field ordering

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add analytics-engine plugin to all test clusters

Every test cluster that loads opensearch-sql-plugin needs the
analytics-engine plugin because SQL declares it as extendedPlugins.
Added to yamlRestTest, integTestWithSecurity, remoteIntegTestWithSecurity,
and integJdbcTest clusters.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add analytics-engine plugin to doctest test cluster

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add commons-text to analytics-engine ZIP and fix isAnalyticsIndex NPE

commons-text is needed by Calcite (parent classloader) for fuzzy matching
but was only in the SQL plugin (child classloader). Also use lightweight
parsing context in isAnalyticsIndex to avoid requiring cluster state.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix Janino classloader issue when analytics-engine is parent plugin

Calcite's EnumerableInterpretable.getBindable() hardcodes
EnumerableInterpretable.class.getClassLoader() for Janino compilation.
When analytics-engine is the parent classloader via extendedPlugins,
this returns the parent classloader which cannot see SQL plugin classes,
causing CompileException for any Enumerable code generation.

Override implement() in OpenSearchCalcitePreparingStmt to use our own
compileWithPluginClassLoader() which does the same code generation but
uses CalciteToolsHelper.class.getClassLoader() (SQL plugin's child
classloader) so Janino can resolve both parent and child classes.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Reverse classloader: make SQL plugin parent of analytics-engine

The previous approach (analytics-engine as parent) caused Janino
classloader issues in 4 Calcite code paths. Reversing the relationship
makes SQL plugin the parent, so Janino uses the SQL plugin classloader
which can see both Calcite and SQL plugin classes.

Changes:
- Remove extendedPlugins=['analytics-engine'] from SQL plugin
- Add ExtensiblePlugin interface to SQLPlugin
- Rebuild analytics-engine ZIP with extended.plugins=opensearch-sql
  and without Calcite JARs (inherited from parent)
- Move OpenSearchSchemaBuilder to analytics-framework JAR
- Change analytics-framework from compileOnly to api in core
- Reverse test cluster plugin load order (SQL first)
- Revert all Janino classloader fixes (no longer needed)
- Add classloader architecture options doc

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove classloader options doc from PR

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove unnecessary Janino classloader workarounds

With SQL plugin as parent (Option C), EnumerableInterpretable and
RexExecutable use the SQL plugin classloader which sees all classes.
The implementEnumerable(), compileWithPluginClassLoader(), and
PluginClassLoaderRexExecutor workarounds are no longer needed.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert "Remove unnecessary Janino classloader workarounds"

This reverts commit 23e0d13.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert "Remove classloader options doc from PR"

This reverts commit 0ad18a8.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert "Reverse classloader: make SQL plugin parent of analytics-engine"

This reverts commit be27381.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert "Fix Janino classloader issue when analytics-engine is parent plugin"

This reverts commit 2c4c401.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Patch Calcite CALCITE-3745: use thread context classloader for Janino

Calcite hardcodes EnumerableInterpretable.class.getClassLoader() and
RexExecutable.class.getClassLoader() for Janino compilation. When
analytics-engine is the parent classloader, this returns the parent
which cannot see SQL plugin classes, causing CompileException.

Patch calcite-core to prefer Thread.currentThread().getContextClassLoader()
(CALCITE-3745), and set the context classloader to the SQL plugin's
classloader before Calcite execution in two places:
- CalciteToolsHelper.OpenSearchRelRunners.run()
- CalciteScriptEngine.compile()

This keeps analytics-engine as the natural parent while fixing all
Janino classloader issues with a minimal, well-understood patch.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
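The fix above hinges on setting the thread context classloader around the Calcite call and always restoring it. A JDK-only sketch of that wrapper — the "work" supplier stands in for Calcite/Janino compilation, and the helper name is illustrative:

```java
import java.util.function.Supplier;

// Sketch of the CALCITE-3745 workaround: point the thread context classloader
// at the SQL plugin's classloader for the duration of the work, then restore
// the previous loader in a finally block so failures can't leak it.
public class ContextClassLoaderSwap {
    static <T> T withClassLoader(ClassLoader target, Supplier<T> work) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(target);
        try {
            return work.get();
        } finally {
            current.setContextClassLoader(previous); // restore even on failure
        }
    }

    public static void main(String[] args) {
        ClassLoader pluginLoader = ContextClassLoaderSwap.class.getClassLoader();
        boolean seen = withClassLoader(pluginLoader,
            () -> Thread.currentThread().getContextClassLoader() == pluginLoader);
        System.out.println("context loader set during work: " + seen);
    }
}
```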

* Fix CalciteExplainIT boolean string literal explain plan difference

With the patched Calcite classloader, SAFE_CAST('TRUE') is no longer
silently folded to a boolean literal in the logical plan. Split the
expected YAML so boolean literal tests (male = true) and string literal
tests (male = 'TRUE') have separate expected outputs. The physical plan
(term query pushdown) is identical in both cases.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add setContextClassLoader wrapper to QueryService Calcite paths

The analyze() and convertToCalcitePlan() steps in executeWithCalcite()
and explainWithCalcite() trigger RexSimplify which uses Janino for
constant folding. Without the context classloader set, UDF expressions
like SAFE_CAST('TRUE') can't be folded, causing explain plan differences
vs main branch. Adding the wrapper here covers the entire Calcite
lifecycle for both execute and explain paths.

Revert the separate boolean string literal YAML since all three boolean
tests now produce the same plan (SAFE_CAST folds correctly).

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Refactor context classloader wrapping into CalciteClassLoaderHelper

Extract the CALCITE-3745 context classloader fix into a dedicated
CalciteClassLoaderHelper utility class. Remove redundant wrapper from
OpenSearchRelRunners.run() (already covered by QueryService callers).

Three call sites remain:
- QueryService.executeWithCalcite() — covers planning + execution
- QueryService.explainWithCalcite() — covers planning + explain
- CalciteScriptEngine.compile() — covers pushdown scripts (shard thread)

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert CalciteToolsHelper comment to sync with main

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Integrate SQL REST endpoint with analytics engine path

Add SQL query routing through the analytics engine for Parquet-backed
indices. SQL queries targeting "parquet_*" indices are routed to
RestUnifiedQueryAction via the unified query pipeline (Calcite SQL
parser -> UnifiedQueryPlanner -> AnalyticsExecutionEngine).

Changes:
- Add SqlNode table extraction in RestUnifiedQueryAction.extractIndexName()
  to support SQL query routing (handles SqlSelect -> SqlIdentifier)
- Add executeSql() and explainSql() methods in RestUnifiedQueryAction
  for SQL queries (parallel to existing PPL execute/explain)
- Add analytics routing in RestSqlAction via optional BiFunction router
  that checks isAnalyticsIndex before delegating to SQLService
- Wire the router in SQLPlugin.createSqlAnalyticsRouter()
- Non-analytics SQL queries fall through to the existing V2 engine
- Add AnalyticsSQLIT integration tests: SELECT *, column projection,
  explain, non-parquet fallback, syntax error handling

Resolves: #5248

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use queryType to branch index extraction instead of instanceof

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use Optional for extractIndexName return type

Signed-off-by: Kai Huang <ahkcs@amazon.com>
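The Optional return type change can be illustrated with a stdlib-only sketch (the extraction logic and `parquet_` routing rule here are simplified stand-ins, not the real parser-backed implementation):

```java
import java.util.Optional;

// Hypothetical sketch of an Optional-returning index-name extraction:
// the caller routes to the analytics path only when a name is present,
// instead of branching on a null sentinel.
public final class IndexNameDemo {

    // Pull the token after "source=" from a PPL-style query, if any.
    public static Optional<String> extractIndexName(String query) {
        int at = query.indexOf("source=");
        if (at < 0) {
            return Optional.empty();
        }
        String rest = query.substring(at + "source=".length());
        int end = rest.indexOf(' ');
        String name = (end < 0 ? rest : rest.substring(0, end)).trim();
        return name.isEmpty() ? Optional.empty() : Optional.of(name);
    }

    public static String route(String query) {
        return extractIndexName(query)
                .filter(name -> name.startsWith("parquet_"))
                .map(name -> "analytics")  // analytics-engine path
                .orElse("v2");             // existing v2 / Calcite path
    }
}
```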

* Unify execute and explain methods for both PPL and SQL paths

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Reuse base class explainQuery() in AnalyticsSQLExplainIT

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Reuse base class executeQuery() in AnalyticsSQLIT

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove no-arg constructor and null check for analyticsRouter

analyticsRouter is always provided — only one caller exists
(SQLPlugin.getRestHandlers). Remove the backward-compatible
no-arg constructor and the null check.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Extract V2 engine delegation into delegateToV2Engine method

Restore original logging (query anonymization, explain fallback)
that was lost when the fallback logic was inlined into the
analytics router lambda.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Refactor SQL table extraction to use SqlBasicVisitor pattern

Replace instanceof checks with SqlTableNameExtractor visitor,
consistent with how PPL uses AbstractNodeVisitor for index
name extraction.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
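The visitor refactor replaces call-site instanceof chains with double dispatch. A stdlib-only sketch of the idea (the node and visitor types are invented for illustration; the real code uses Calcite's SqlBasicVisitor over SqlNode):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each AST node accepts a visitor, and a
// table-name-extracting visitor collects names without any instanceof
// checks at the call site.
public final class VisitorDemo {

    interface Node { <R> R accept(Visitor<R> v); }

    interface Visitor<R> {
        R visitTable(TableRef t);
        R visitSelect(Select s);
    }

    record TableRef(String name) implements Node {
        public <R> R accept(Visitor<R> v) { return v.visitTable(this); }
    }

    record Select(Node from) implements Node {
        public <R> R accept(Visitor<R> v) { return v.visitSelect(this); }
    }

    static final class TableNameExtractor implements Visitor<List<String>> {
        public List<String> visitTable(TableRef t) { return List.of(t.name()); }
        public List<String> visitSelect(Select s) {
            return new ArrayList<>(s.from().accept(this));
        }
    }

    public static List<String> tablesOf(Node root) {
        return root.accept(new TableNameExtractor());
    }
}
```

Adding a new node kind then means adding one visitor method rather than editing every extraction call site.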

* Move SqlBasicVisitor to import header

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Version bump to OpenSearch 3.7 (Jackson 2 → 3 parser API + _source excludes serialization) (#5361)

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Update vendored analytics JARs from 3.6 to 3.7

Replace libs/ JARs with 3.7.0-SNAPSHOT versions built from the
OpenSearch sandbox. Update build files to reference 3.7 file names.
Keep files() references instead of Maven coordinates so CI can
resolve dependencies without publishToMavenLocal.

Signed-off-by: Ahmed Khatib <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Async QueryPlanExecutor + drop stub and stub-only ITs

The analytics-framework 3.7 QueryPlanExecutor interface is async
(void execute(plan, ctx, ActionListener<Stream>)). Update
AnalyticsExecutionEngine to dispatch via the listener instead of the
old synchronous return-value form, and rework the unit test mocks
likewise.

Drop StubQueryPlanExecutor and its callers (TransportPPLQueryAction,
SQLPlugin, RestUnifiedQueryActionTest); the SQL plugin now expects
the real analytics-engine executor to be supplied via Guice
cross-plugin injection (a small AnalyticsExecutorHolder bridges the
binding to the SQLPlugin REST handler that runs before Node injection).

Drop AnalyticsPPLIT and AnalyticsExplainIT (and their
expectedOutput/analytics fixtures): these tested stub canned data
which no longer exists.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
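The synchronous-to-async interface shift can be sketched with stdlib types only (the `Listener` interface below stands in for OpenSearch's `ActionListener`; the executor body is a canned stand-in, not real plan execution):

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the 3.7 interface shift: instead of returning
// results from execute(), the executor reports them through a listener.
public final class AsyncExecutorDemo {

    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    // Old shape:  List<Object[]> execute(String plan)
    // New shape:  void execute(String plan, Listener<List<Object[]>> l)
    static void execute(String plan, Listener<List<Object[]>> listener) {
        try {
            // Stand-in for real plan execution.
            listener.onResponse(Collections.singletonList(new Object[] {plan, 1}));
        } catch (Exception e) {
            listener.onFailure(e);
        }
    }

    // Callers that need a value can block on a future fed by the listener.
    public static List<Object[]> executeBlocking(String plan) {
        CompletableFuture<List<Object[]>> future = new CompletableFuture<>();
        execute(plan, new Listener<>() {
            public void onResponse(List<Object[]> rows) { future.complete(rows); }
            public void onFailure(Exception e) { future.completeExceptionally(e); }
        });
        return future.join();
    }
}
```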

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Ahmed Khatib <ahkcs@amazon.com>
opensearch-build-tools requires Gradle 9.4.1+, which currently breaks
the analytics-engine integration build that depends on this branch.
Mirrors opensearch-project/geospatial#854.

Signed-off-by: Kai Huang <huangkaics@gmail.com>
…5409)

The arrow-flight-rpc plugin (a transitive parent of analytics-engine via
extendedPlugins) bundles jsr305-3.0.2.jar, and the SQL plugin pulls in the
same artifact via Calcite's transitive findbugs:jsr305 dependency. With both
plugins shipping the same jar, OpenSearch's PluginsService.checkBundleJarHell
fails at install time with:

  IllegalStateException: jar hell!
  class: javax.annotation.CheckForNull
  jar1: .../installing-.../jsr305-3.0.2.jar
  jar2: .../plugins/arrow-flight-rpc/jsr305-3.0.2.jar

Same pattern as #5400 — exclude the jar from the SQL bundle and rely on the
copy already provided by arrow-flight-rpc through the shared classloader.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Carry CalciteEvalCommandIT through the helper-managed index path

The IT's init() previously created the in-test test_eval index via direct
`PUT /test_eval/_doc/N` requests, relying on dynamic mapping. Two
problems with that:

  1. The doc PUTs auto-create the index with whatever settings the
     cluster defaults to. The analytics-engine compatibility path
     (force-routing on; tests.analytics.parquet_indices=true) needs
     parquet-backed indices, which TestUtils.createIndexByRestClient
     applies via TestUtils.makeParquetBacked when the system property
     is set. Direct PUTs sidestep that helper, so test_eval lands as
     Lucene-backed and the analytics planner rejects it with
     "No backend can scan all requested fields on index [test_eval]".
     Three of the four tests fail at execution; the fourth uses BANK
     (loadIndex'd) and passes.

  2. init() runs before every @Test method. The doc PUTs are doc-level
     idempotent, so re-running was wasteful but not failing. Once we
     switch to createIndexByRestClient, the index-level PUT is no longer
     idempotent and re-running throws "resource_already_exists_exception".

Both addressed in one change:

  - test_eval is created via TestUtils.createIndexByRestClient with an
    explicit mapping (name/title=keyword, age=long). The helper honours
    tests.analytics.parquet_indices=true and produces a parquet-backed
    index for the analytics-engine sweep; on the v2 path the helper
    is a no-op around the index PUT, so behaviour is unchanged.

  - The whole init body is guarded by TestUtils.isIndexExist — same
    idempotency idiom that loadIndex uses for predefined fixtures. First
    @Test method provisions; subsequent methods skip.

Also pins the projection order on testEvalStringConcatenation. The
original query (`source=test_eval | eval greeting = 'Hello ' + name`)
had no `| fields` clause and relied on the implicit projection's column
order — v2 returns Lucene-source insertion order, analytics returns
parquet-storage order (alphabetical), so the assertion only matched on
v2 by coincidence. Adding `| fields name, title, age, greeting` makes
the assertion deterministic across paths; the existing expected rows
(`rows("Alice", "Engineer", 25, "Hello Alice")`) already match this
order, so v2 behaviour is preserved.

No semantic change for the v2 path: the explicit mapping types
(keyword, long, keyword) resolve to the same PPL types ("string",
"bigint", "string") that dynamic mapping inferred, and eval reads from
_source either way. Analytics-route compatibility goes from 1/4 to 4/4
once the corresponding analytics-engine wiring lands in core.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Exclude @Ignore'd test classes from integTest as Gradle 9.4.1 workaround

Gradle 9.4.1 has a ClassCastException in TestEventReporterAsListener.started
(line 58 of org/gradle/api/internal/tasks/testing/junit/result/TestEventReporterAsListener.class
inside plugins/gradle-testing-base-9.4.1.jar):

  ((GroupTestEventReporterInternal) reportersById.get(parent.getId()))
      .reportTestDirectly(...)

The bridge stores test descriptor reporters keyed by id and casts the
parent's stored reporter to GroupTestEventReporterInternal when a child
event arrives. A class-level @Ignore produces a non-composite parent
descriptor with a leaf TestEventReporter (not a group reporter), so the
cast fails. The first failure in CI was on
CalciteInformationSchemaCommandIT.testTablesFromPrometheusCatalog —
that IT is @Ignore'd at the class level pointing at issue #3455.

Once any @Ignore'd class is loaded by the test runner, the cast aborts
the whole integTest task with "Test process encountered an unexpected
problem", even though no actual test assertion failed and the tests
themselves would have been skipped at the JUnit layer.

The bridge is instantiated by Gradle's own AbstractTestTask (verified
via javap of `gradle-testing-base-9.4.1.jar` — `new TestEventReporterAsListener`
at AbstractTestTask:263), so we cannot prevent its registration from
user code. The only available remedy until Gradle ships a fix is to
keep @Ignore'd classes off the test runner's input set entirely.

Net behaviour for CI:
  - @Ignore'd tests were already skipped at the JUnit layer; skipping
    them at the Gradle layer instead changes "skipped" to "not run".
    For metrics/dashboards that count skipped tests, this is a one-time
    minor delta. For pass/fail status, identical.
  - Pre-existing exclude block (`OrderIT`, `ExplainIT`, etc.) already
    excludes some of the same classes; this commit covers the remaining
    @Ignore'd ones. OrderIT not duplicated.
  - integTest task no longer aborts mid-run; per-test FAILED messages
    (testLogging.events "failed") are preserved.

Remove this block once the Gradle 9.4.x bug is fixed upstream. The list
mirrors `grep -rln '^@Ignore' integ-test/src/test/java` minus
already-excluded paths.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Default plugins.calcite.enabled=true on the unified query path

The unified query handler (`RestUnifiedQueryAction` → `TransportPPLQueryAction`
→ analytics-engine) builds its `UnifiedQueryContext` with an empty `Settings`
map; nothing wires the cluster setting through. PPL parsing is delegated to
the same `AstBuilder` the v2 path uses, and that builder gates `table` (and
other Calcite-only commands) on `Settings.Key.CALCITE_ENGINE_ENABLED`. With
no setting propagated, the gate sees `null`, fails the `Boolean.TRUE.equals`
check, and throws

  UnsupportedOperationException: Table command is supported only when
  plugins.calcite.enabled=true

even when the cluster setting is true, blocking every `table` query routed
through the analytics path under `tests.analytics.force_routing=true`.

The unified path is by definition Calcite-based — every query reaching
`UnifiedQueryContext` flows through Calcite's planner. Default
`CALCITE_ENGINE_ENABLED=true` in `buildSettings()` when the underlying map
doesn't have the key. This unblocks `table` and any other AstBuilder gate
that defends the same toggle, without changing v2 behavior (v2 constructs
`AstBuilder` with cluster `Settings`, not the unified context).

Also refresh the PPL analytics-engine routing dev doc to document the
unified-context dependency and switch the per-command verification recipe
from the bundle-branch `analyticsCompatibilityTest` task to the standard
`integTestRemote -Dtests.analytics.{force_routing,parquet_indices}=true` —
the bundle-branch task is purely for the daily coverage-report sweep.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Drop dev docs from this PR — track separately

These local routing-and-coverage notes are useful but belong in their own
review thread (or stay as untracked working notes), not in the
unified-context fix. Keeping them on disk via .gitignore-style untrack so
the PR stays focused on the single api/ change.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Address @dai-chen: add CALCITE_ENGINE_ENABLED to default settings map

@dai-chen suggested either adding the setting to the default list at
UnifiedQueryContext.java:123 or passing it via the .setting() API on the
builder, since only the unified PPL path requires it.

Going with option 1 (default-list entry) — it composes the same way every
other planning-required default does, and avoids forcing every caller (the
production REST handler and every IT) to remember a single magic .setting()
call. The IT at UnifiedQueryOpenSearchIT.java:51 was the precedent for the
.setting() approach but only one IT needs it; the production caller
(RestUnifiedQueryAction) wouldn't have a natural place to wire this without
either repeating it everywhere or routing through a helper.

Removes the conditional in buildSettings().getSettingValue() — the default
map now carries the value the same way QUERY_SIZE_LIMIT, PPL_SUBSEARCH_MAXOUT,
and PPL_JOIN_SUBSEARCH_MAXOUT do.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
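The default-list approach described above amounts to a two-level lookup: the caller's settings map first, then a defaults map that carries every planning-required key. A stdlib-only sketch (class name and the second default key are illustrative, not the real UnifiedQueryContext internals):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the default-settings-map approach: lookups
// consult the provided map first, then the defaults — no per-key
// conditionals inside the getter.
public final class SettingsDemo {

    private static final Map<String, Object> DEFAULTS = Map.of(
            "plugins.calcite.enabled", true,  // unified path is Calcite-based
            "plugins.query.size_limit", 10000);

    private final Map<String, Object> provided;

    public SettingsDemo(Map<String, Object> provided) {
        this.provided = new HashMap<>(provided);
    }

    public Object getSettingValue(String key) {
        return provided.getOrDefault(key, DEFAULTS.get(key));
    }
}
```

With an empty provided map, `getSettingValue("plugins.calcite.enabled")` yields true, so a gate that checks the setting passes without every caller having to remember a magic `.setting()` call.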

* CalcitePPLRenameIT: switch to column-name-aware row matcher

Previously, {@code verifyStandardDataRows} (and a handful of bespoke
{@code verifyDataRows} calls in this file) compared rows positionally —
they assumed the engine emits columns in the {@code _source} iteration
order the v2 / Lucene path produces. Under the analytics-engine route the
parquet-backed reader returns columns in storage order, so the same 4
canonical state_country rows came back as {@code [70,"USA",4,"Jake",
"California",2023]} instead of {@code [Jake,USA,California,4,2023,70]}
and 12 of 22 tests failed despite the data being identical.

Replace with a column-name-keyed expected-row map. The helper reads the
actual schema from the response, looks up each canonical value by column
name, places it at the corresponding schema position, then defers to
{@code verifyDataRows} as before. The contract is identical to the
existing {@code verifySchema} matcher — both are set-equality on column
names — so the test no longer leaks the engine's emission order into
the assertion.

Each call site passes the canonical column-name list (with rename
substitutions where applicable). Tests that don't rename age keep
calling the no-arg form. Both paths now pass:

* Analytics-engine route (`tests.analytics.force_routing=true`): 20 / 22
  (the remaining 2 are `testRenameInAgg` /
  `testRenameWithBackticksInAgg`, blocked on the Substrait isthmus
  AVG-binding follow-up tracked in
  opensearch-project/OpenSearch#21521).
* v2 / Calcite route (default routing): 20 / 22 (same two tests fail with
  `NoClassDefFoundError: LevenshteinDistance` only when running against
  the OS-core `:run` cluster — that bundle is missing commons-text.
  CI's own integ-test cluster bundles commons-text via the SQL plugin's
  classloader and isn't affected.)

Signed-off-by: Kai Huang <ahkcs@amazon.com>
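The column-name-keyed helper described above can be sketched in a few lines of stdlib Java (names hypothetical; the real helper then defers to the existing positional verifyDataRows):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a column-name-keyed row matcher: look up each
// expected cell by column name and emit it in the order the response
// schema actually used, so the assertion no longer depends on the
// engine's column emission order.
public final class RowMatcherDemo {

    public static Object[] reorderToSchema(List<String> schemaColumns,
                                           Map<String, Object> expectedByColumn) {
        // Same contract as a set-equality schema matcher: both sides must
        // name exactly the same columns.
        if (!schemaColumns.stream().sorted().toList()
                .equals(expectedByColumn.keySet().stream().sorted().toList())) {
            throw new AssertionError("schema columns != expected columns");
        }
        return schemaColumns.stream().map(expectedByColumn::get).toArray();
    }
}
```

Whether the engine emits `[age, name]` (parquet storage order) or `[name, age]` (_source order), the reordered expected row lines up with the actual one.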

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Mirrors the trailing portion of #5414 onto
feature/mustang-ppl-integration. The Gradle 9.4.1 wrapper bump (#5406)
and the integTest exclude block for @Ignore'd classes (#5407) already
landed on this branch, but the matching cleanup of `CalciteNoPushdownIT`
did not — the @Suite still listed four classes that are now build-time
excluded from integTest:

  - CalciteInformationSchemaCommandIT
  - CalciteJsonFunctionsIT
  - CalcitePrometheusDataSourceCommandsIT
  - CalciteShowDataSourcesCommandIT

Leaving them in the suite means the JUnit runner attempts to load each
class, finds it referenced from the suite, and registers a descriptor —
which (combined with the class-level @Ignore) reproduces the same
TestEventReporterAsListener.started cast that the integTest exclude was
meant to avoid. Dropping the entries here keeps the no-pushdown suite
consistent with the exclude block: a class excluded at the build layer
must not also appear in any @Suite.SuiteClasses list.

No new tests are introduced or removed — the four classes are already
@Ignore'd and have been since they were written. This change is purely
about keeping the no-pushdown suite metadata aligned with the exclusion
pattern carried over from main.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ne route (#5415)

The analytics-engine route and the v2 / Lucene path return columns in different
orders when there is no explicit `| fields ...` clause: parquet preserves the
storage order chosen by the on-disk format, while the Lucene path preserves
`_source` iteration order. Both are valid given the contract `verifySchema`
declares (set equality on column names), so positional `verifyDataRows`
assertions over-constrain the test and fail under
`-Dtests.analytics.force_routing=true` even when the data is correct.

Apply the same column-name-keyed match pattern Kai introduced for
`CalcitePPLRenameIT` in 59c728b (#5413):

  * Add `rowOf(key1, val1, ...)` to build column-keyed expected rows.
  * Add `verifyDataRowsByColumn(...)` to look up each cell value by column
    name and reorder to match the response schema before delegating to the
    existing positional `verifyDataRows` matcher.
  * Convert the four order-sensitive tests
    (`testMultipleReplace`, `testEmptyStringReplacement`,
    `testMultipleFieldsInClause`, `testMultiplePairsInSingleCommand`) to
    the new helpers.
  * Make `testReplaceNonExistentField` order-agnostic on the
    `input fields are: [...]` field list — assert that the prefix and every
    expected field name appear in the message, but not in a fixed order.

Test results against analytics-engine route via
`-Dtests.analytics.{force_routing,parquet_indices}=true`: 21/21 pass in both
the direct `CalciteReplaceCommandIT` suite and the `CalciteNoPushdownIT >
CalciteReplaceCommandIT` re-run. v2 path remains green.

Companion to the OpenSearch PR onboarding PPL `replace` command +
`replace()` / `regexp_replace()` functions on the analytics-engine route
via DataFusion `replace` / `regexp_replace` UDFs.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
…th (#5417)

Same shape as #5407 for CalciteEvalCommandIT. The IT's init() previously
created the in-test test_eval index via direct `PUT /test_eval/_doc/N`
requests, relying on dynamic mapping. Two problems:

  1. The doc PUTs auto-create the index with whatever settings the cluster
     defaults to. The analytics-engine compatibility path (force-routing on;
     tests.analytics.parquet_indices=true) needs parquet-backed indices,
     which TestUtils.createIndexByRestClient applies via
     TestUtils.makeParquetBacked when the system property is set. Direct
     PUTs sidestep that helper, so test_eval lands as Lucene-backed and
     the analytics planner rejects it with "No backend can scan all
     requested fields on index [test_eval]". All four working tests fail
     at execution.

  2. init() runs before every @Test method. The doc PUTs are doc-level
     idempotent, so re-running was wasteful but not failing. Once we
     switch to createIndexByRestClient, the index-level PUT is no longer
     idempotent and re-running throws "resource_already_exists_exception".

Both addressed in one change:

  - test_eval is created via TestUtils.createIndexByRestClient with an
    explicit mapping (name/title=keyword, age=long). The helper honours
    tests.analytics.parquet_indices=true and produces a parquet-backed
    index for the analytics-engine sweep; on the v2 path the helper is a
    no-op around the index PUT, so behaviour is unchanged.

  - The whole init body is guarded by TestUtils.isIndexExist — same
    idempotency idiom that loadIndex uses for predefined fixtures. First
    @Test method provisions; subsequent methods skip.

Also pins the projection order on testFieldFormatStringConcatenation. The
original query (`source=test_eval | fieldformat greeting = 'Hello ' +
name`) had no `| fields` clause and relied on the implicit projection's
column order — v2 returns Lucene-source insertion order, analytics returns
parquet-storage order (alphabetical), so the assertion only matched on v2
by coincidence. Adding `| fields name, title, age, greeting` makes the
assertion deterministic across paths; the existing expected rows
(`rows("Alice", "Engineer", 25, "Hello Alice")`) already match this
order, so v2 behaviour is preserved.

The other four tests already had explicit `| fields ...` clauses, so no
change there.

No semantic change for the v2 path: the explicit mapping types (keyword,
long, keyword) resolve to the same PPL types ("string", "bigint", "string")
that dynamic mapping inferred, and fieldformat reads from _source either
way.

Analytics-route compatibility goes from 1/5 to 4/5 (verified locally
against a runTask cluster with analytics-engine + opensearch-sql-plugin).
The remaining `testFieldFormatStringConcatenationWithNullFieldToString`
needs a `tostring()` UDF on the analytics path — a multi-mode UDF (binary
/ hex / commas / duration) tracked separately as out of scope.

Test plan:
- ./gradlew :integ-test:integTest --tests
  'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT'
  -> 5/5 green (v2 path, no regression).
- ./gradlew :integ-test:analyticsCompatibilityTest --tests
  'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT'
  -> 4/5 pass; the 5th fails on `tostring`'s missing capability registration,
  which is the documented out-of-scope category.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
The sql plugin previously declared analytics-engine as a hard dependency:

    extendedPlugins = ['opensearch-job-scheduler', 'analytics-engine']

Installing opensearch-sql on a distribution that doesn't ship
analytics-engine failed with:

    Missing plugin [analytics-engine], dependency of [opensearch-sql]

Marking the dep ;optional=true alone is not enough — TransportPPLQueryAction
Guice-injects QueryPlanExecutor on its constructor, and Guice's OpenSearch
fork rejects a required constructor parameter whose binding is missing at
injector-build time ("constructors cannot be optional").

Move QueryPlanExecutor from a required constructor parameter to an
@Inject(optional=true) setter. Guice invokes the setter only when a binding
for QueryPlanExecutor<RelNode, Iterable<Object[]>> exists — i.e. when
analytics-engine's createGuiceModules has run and bound DefaultPlanExecutor.
Absent analytics-engine, the setter is silently skipped,
unifiedQueryHandler stays null, and all PPL queries route to the v2
Calcite-to-OpenSearch path already in the sql plugin.

Drop the bundlePlugin exclude list. OpenSearch's jar-hell check skips the
extended-plugin cross-check when the dep is marked optional
(PluginsService.java:763), so sql can bundle every jar it needs to run
self-sufficiently. When analytics-engine is installed, parent-first
classloader delegation still lets analytics-engine's copies win for any
shared class; sql's bundled copies sit idle.

Promote analytics-framework.jar from compileOnly to api so
QueryPlanExecutor is reachable from sql's own classloader when the plugin
is absent. analytics-engine.jar stays compileOnly (required only for
OpenSearchSchemaBuilder, which lives in the engine plugin and is reached
only through RestUnifiedQueryAction — a class that never loads when the
setter is skipped).

Validated on a live 2-node cluster in both configurations:

- With analytics-engine installed: legacy and analytics PPL both return
  expected rows; routing to the analytics path still fires for
  parquet_-prefixed indices.
- Without analytics-engine (only opensearch-job-scheduler + opensearch-sql
  installed): cluster starts cleanly, PPL and SQL queries against Lucene
  indices return expected rows, parquet_-prefixed lookups return a clean
  IndexNotFoundException instead of a NullPointerException or
  NoClassDefFoundError.

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
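The optional-setter pattern can be sketched without Guice (the `Handler` interface and routing strings below are invented for illustration; the real code injects a `QueryPlanExecutor` via `@Inject(optional = true)`):

```java
// Hypothetical stdlib-only sketch: the transport action starts with no
// analytics handler; an optional setter (in the real plugin, annotated
// @Inject(optional = true)) is invoked only when the analytics-engine
// binding exists. A null handler means every query routes to v2.
public final class OptionalHandlerDemo {

    interface Handler { String execute(String query); }

    private Handler unifiedQueryHandler;  // stays null without analytics-engine

    // Guice would skip this setter when no binding is present.
    public void setUnifiedQueryHandler(Handler handler) {
        this.unifiedQueryHandler = handler;
    }

    public String route(String query, boolean isAnalyticsIndex) {
        if (isAnalyticsIndex && unifiedQueryHandler != null) {
            return unifiedQueryHandler.execute(query);  // analytics path
        }
        return "v2:" + query;                           // existing Calcite path
    }
}
```

The key property is graceful degradation: with the setter never called, analytics-targeted queries still take a well-defined path instead of hitting a NullPointerException.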
…5397)

Single squashed commit on top of feature/mustang-ppl-integration that
absorbs upstream/main's commits not yet on the feature branch. Replaces
the prior catch-up squash (#5396 base + the original af831d3 rebase
commit) so this PR is a fast-forward into feature/mustang-ppl-integration.

Squashed (rather than a merge commit) because upstream main commits
were authored by many contributors with inconsistent or missing
Signed-off-by trailers; DCO would otherwise reject those commits.

Main commits absorbed (54 since divergence; 4 since the original
catch-up squash was made on 2026-04-30):
- #5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- #5408 (datetime type normalization)
- #5414 (Gradle wrapper bump + @ignore exclusion)
- #5399 (FGAC-scoped SQL cursor continuation)
- #5394 (SQL Vector Search), #5361 (OpenSearch 3.7), #5360 (unified SQL
  language spec), #5240 (PPL Union), and 46 others.

Conflict resolutions:

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in #5408 / #5419.

core/executor/QueryService.java: composed both sides — kept feature's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking; both improvements are orthogonal.

legacy/plugin/RestSqlAction.java: took feature. The 3-way merge
produced a duplicated handleException/getRawErrorCode block; feature
already contained both the delegateToV2Engine refactor and the
ErrorReport unwrap from main, so feature is the correct superset.

CLAUDE.md, docs/user/ppl/functions/condition.md: took main.

explain_streamstats_global{,_null_bucket}.yaml: took main (post-#5359
shape).

core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation
utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into
PlanUtils.findInputCollation).

integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT.

ppl/antlr/OpenSearchPPLParser.g4: added unionCommand.

ppl/calcite/CalcitePPLStreamstatsTest.java: added
testMultipleStreamstatsWithWindow.

integ-test/build.gradle: took feature. Both sides added the same
@ignore exclusion block; feature has alphabetical ordering and a more
detailed comment explaining the Gradle 9.4.1 cast bug.

integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took
feature's helper-managed test_eval provisioning (createIndexByRestClient
+ isIndexExist guard, from #5407) so analytics-engine compatibility runs
get a parquet-backed index. Added the test_eval_agent setup (needed by
the dotted-path eval tests for #5351) wrapped in its own isIndexExist
guard for the same parquet-aware idempotency.

plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java:
took feature. PR #5403 made analytics-engine an optional dependency by
moving QueryPlanExecutor from a required constructor parameter to an
@Inject(optional=true) setter, and removed the loadExtensions /
EngineExtensionsHolder / executionEngineExtensions plumbing. Feature
retains the createSqlAnalyticsRouter method this catch-up introduced.

plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced
post-#5403; not present on feature).

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava +
:integ-test compileTestJava all pass; unit tests pass; spotlessCheck
clean.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Default empty array() return type to ARRAY<VARCHAR>

PPL's `array()` no-arg form delegates to Calcite's `SqlLibraryOperators.ARRAY`
return-type inference, which returns ARRAY<NULL> for an empty operand list and
ARRAY<UNKNOWN> when all operands are typeless nulls. Both markers are fine for
the v2 engine — `ArrayImplementor.internalCast` only consumes the element type
when there are elements to cast, so an empty result Object list flows straight
through to ExprCollectionValue regardless of declared element type.

The analytics-engine route is stricter. When isthmus walks a RexCall like
`mvjoin(array(), '-')`, it reaches its first operand's type and feeds it to
`io.substrait.isthmus.TypeConverter.toSubstrait`, which throws
`UnsupportedOperationException: Unable to convert the type UNKNOWN`. Substrait
has no on-wire encoding for NULL/UNKNOWN element types, so the planner can't
serialize the call at all. Two PPL ITs hit this directly:

  * `CalciteArrayFunctionIT.testMvjoinWithEmptyArray`
  * `CalciteArrayFunctionIT.testMvdedupWithEmptyArray`

Substituting VARCHAR when the inferred element type is NULL or UNKNOWN gives
the call a substrait-serializable type without affecting any value
computation: the result list is empty either way.

# Test plan

* Unit tests: `:core:test --tests "*ArrayFunction*"` — passes locally (no
  existing tests asserted on the empty-array element type).
* IT: `CalciteArrayFunctionIT` force-routed through the analytics-engine path
  via opensearch-project/OpenSearch#21554's plugin set —
  testMvjoinWithEmptyArray and testMvdedupWithEmptyArray now pass (were UNKNOWN
  type errors); pass-rate moved 26/60 → 28/60.

Companion to opensearch-project/OpenSearch#21554.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
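The fallback itself is a small substitution on the inferred element type. A sketch with a stand-in enum (the real code works on Calcite's RelDataType / SqlTypeName, not this enum):

```java
// Hypothetical sketch of the element-type fallback: when the inferred
// element type of an empty (or all-typeless-null) array() call is the
// NULL/UNKNOWN marker, substitute VARCHAR so the type can be serialized
// downstream; concrete element types pass through unchanged.
public final class ArrayTypeDemo {

    enum SqlType { NULL, UNKNOWN, VARCHAR, INTEGER }

    public static SqlType elementTypeWithFallback(SqlType inferred) {
        // Substrait has no on-wire encoding for NULL/UNKNOWN element types.
        if (inferred == SqlType.NULL || inferred == SqlType.UNKNOWN) {
            return SqlType.VARCHAR;  // value computation unaffected: list is empty
        }
        return inferred;
    }
}
```

This mirrors the four unit-test shapes listed above: the fallback fires only for the NULL/UNKNOWN markers, never for ARRAY&lt;INTEGER&gt; or ARRAY&lt;VARCHAR&gt; inferred from real operands.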

* Add unit tests for empty/UNKNOWN ARRAY → VARCHAR fallback

Cover the four shapes that exercise the return-type inference path
introduced in 666dc0e:

  * array()                  — 0 operands, fallback fires → ARRAY<VARCHAR>
  * array(NULL)              — typeless-null operand, fallback fires → ARRAY<VARCHAR>
  * array(1)                 — INTEGER operand, fallback does NOT fire → ARRAY<INTEGER>
  * array('a', 'b')          — VARCHAR operands, fallback does NOT fire → ARRAY<VARCHAR>

The third case is the regression guard requested by review — confirms
concrete element types pass through unchanged and the fallback is scoped
strictly to the {@code NULL}/{@code UNKNOWN} markers.

The harness uses Calcite's {@link ExplicitOperatorBinding} bound to
{@link SqlLibraryOperators#ARRAY} so the inference's internal
{@code SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(...)}
call resolves the same operator the production code delegates to —
mocking {@code SqlOperatorBinding} directly hits NPEs deep inside
Calcite's least-restrictive-type computation.

Addresses review feedback on #5421.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Trim inline comment per review feedback

The "why" lives in the PR description; the inline comment now points
there instead of duplicating it. Addresses dai-chen's review note on
#5421.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
…5423)

* [Calcite] Make containsNestedAggregator tolerate flat-leaf schemas

  The check splits each aggregator argument on '.' and looks up the
  leading token as a top-level ARRAY column (the OpenSearch `nested`
  marker). It used relBuilder.field(root), which throws when the column
  doesn't exist.

  That works for the classic path, which always exposes a top-level
  column for object/nested parents. It breaks the analytics-engine
  path, which emits only flat leaves ("city.name", "city.location.latitude")
  — parent placeholders can't round-trip through Substrait. Any
  aggregation against an object-field leaf (e.g. stats max(city.location.latitude))
  crashed with "Field [city] not found".

  Use RelDataType.getField, which returns null on miss. A null lookup
  is the correct "not nested" answer. Behavior unchanged for relations
  that do have a top-level ARRAY column.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
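The throw-vs-null distinction at the heart of this fix can be sketched with a plain map standing in for the row type (method names and schema shape hypothetical; the real check walks RelDataType fields):

```java
import java.util.Map;

// Hypothetical sketch: a RelBuilder.field(...)-style lookup throws on a
// missing column, while a RelDataType.getField(...)-style lookup returns
// null — and for the "is this a nested/ARRAY parent?" check, null is the
// correct "not nested" answer on flat-leaf schemas.
public final class FieldLookupDemo {

    // Throwing lookup (classic-path assumption: parent columns exist).
    public static String strictField(Map<String, String> schema, String name) {
        String type = schema.get(name);
        if (type == null) {
            throw new IllegalArgumentException("Field [" + name + "] not found");
        }
        return type;
    }

    // Null-tolerant lookup, as RelDataType.getField behaves on a miss.
    public static boolean isNestedParent(Map<String, String> schema, String dottedArg) {
        String root = dottedArg.split("\\.")[0];  // leading token, e.g. "city"
        String type = schema.get(root);           // null on flat-leaf schemas
        return "ARRAY".equals(type);              // null => simply "not nested"
    }
}
```

On a flat-leaf schema exposing only "city.name" and "city.location.latitude", the null-tolerant check answers "not nested" instead of crashing with "Field [city] not found".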

* fix spotless

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f006e29.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@ahkcs
Collaborator Author

ahkcs commented May 11, 2026

Superseded by #5430. Retargeted #5430 to base main so a single PR delivers the merge commit + all 22 feature-branch commits in one shot.

@ahkcs ahkcs closed this May 11, 2026
